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Herein we study the problem of assessing, on the basis of noisy and incomplete observations, how much 
information there is in the data for model identification in compartmental systems. The underlying concept 
is that of an "information distance" between competing models, and estimation of this distance on the basis 
of the given data is discussed. Useful reduction of the dimensionality of the corresponding least squares 
problem is accomplished by regarding the decay rate constants as primary parameters of interest and the other 
parameters of the model as nuisance parameters. Estimation of the decay rate function is also discussed. 
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1. Introduction 

Compartmental models are widely used in many Fields 
of science and engineering — pharmacokinetics, bio- 
chemistry, physiology, radioactive isotopes and tracers, 
to name a few. The basic equations of a typical (linear) 
compartmental model with k compartments are 



dxt/dt =Cjo+ 2 (eg Xj — Cji xi)— c 0i Xj , 



(1) 



0<* < oo , / = 1 , ...,k ; where c$ are nonnegative constants 
(called the "turnover rate constants"), compartment 
denotes the environment, and x, =x,(t) is the amount of 
material in compartment i at time t. Letting / =c, and 
fi/^CijXj for y#0, fij represents the mass flow rate to 
compartment i from compartment,/. Under certain as- 
sumptions (cf. [3], [12]) 1 , integration of differential eq (1) 
leads to 






(2) 
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i = \, ..., k; where \ it ..,, \ k are positive constants 
(called the "decay rate constants") depending only on 
the turnover rate constants c, ? , and j3, and ct,y are non- 
negative constants. 

Noting that it is often not possible to have accurate 
and complete measurements from all the compartments, 
Berman and Schoenfeld [2] considered the degrees of 
freedom in choosing a compartmental model compatible 
with the data when measurements are incomplete. In 
particular, when one can only measure aggregate input- 
output characteristics so that one observes only 
x(t)=1Xj(t), then one can only identify the rate 
constants X u ..., X k of the model (2) from the equation 



x(t)=B + 2,Aje- 

y'=i 



(3) 



where B = 2 /3, and Aj =2a ( , In pharmacokinetic ap- 

i= i t= i 

plications, Wagner [10] has shown that many basic 

quantities of interest can be expressed in terms of the 
parameters of the reduced model (3), the use of which 
he recommends in lieu of the full model (2) whose spec- 
ification usually leads to ambiguities in these applica- 
tions because of noisy and incomplete measurements. 
The difficulties in model formulation and identi- 
fication are compounded when the measurements are 
not only incomplete but also are subject to error. This 
leads to the question of assessing, on the basis of noisy 
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and incomplete observations, how much information 
there is in the data for model identification. If the 
amount of information is found to be inadequate for 
meaningful specification of the full model, then it may 
be more useful to work with a reduced model or directly 
with certain scientifically important characteristics 
(functions of the parameters) of the model. 

Since both eqs (2) (for the individual compartments) 
and (3) (for the whole system) express the response as a 
poly exponential function of time t, the statistical prob- 
lems in the present context are basically those of param- 
eter estimation for polyexponential regression models. 
We discuss herein 1) the information content in the data 
from these regression models, and 2) estimation of 
model parameters and certain functions thereof. 



2. Noisy Data and Information Content 

Consider the polyexponential regression model 

yi~p-h*£ctj exp(-'hjt i )+<: i , i = \, ..,, n, (4) 

where the e, represent random errors. The errors e, are 
usually assumed to be independent random variables 
with zero means. Letting 9 = (k u .,., \ k ; a u ..., a k , j9) 
and 

Mi)=fi + a t e~ x "+,.. + a k e- ,l " r 

common models for Var(£,) are: 

(i) Var(e,) = o- 2 (constant variance error model), 
(ii) Var(e,)=yi((//)o- 2 (constant coefficient of 

variation model), 
(Hi) Varfe^^yjt^Jo- 2 (Poisson-type error model). 

Statistical methods for estimating the unknown pa- 
rameters of the regression function try to "average out" 
the random errors in various ways. Assuming that ob- 
servations are taken at equally spaced times /,=(;' — 1)A 
and that n=(2k + l)m, the method of Lanczos 
[6, p. 273] and Cornell [4] uses the sample means 



Since the y r are strongly consistent estimates of it,, it 
follows that the Cornell- Lanczos method is consistent, 
as was established by Cornell [4]. However, Lanczos [6] 
gave an example to demonstrate "surprising numerical 
snags which may develop on account of the exceedingly 
nonorthogonal behavior of exponential functions." The 
true regression function in Lanczos' example is 

/ a (O=0.0951 e -'+0.8607 e ~ 3 '+ 1.5576 e" 5 '. (5a) 

On the basis of 24 successive decay observations in the 

time interval <0,1.2] of the form 2.51, 2.04 0.07, 0.06, 

which are accurate up to 1/2 unit of the second decimal, 
and assumming knowledge of /3=0 and k — Z, the pre- 
ceding method gives the fitted model 



/ 8 (/)=0 + 0.305 <?-'- 53r +2.202 e - 



(5b) 



Although the true parameters are disappointingly far 
from their estimates, the fitted function (5b) is remark- 
ably close to the true function (5a), and one cannot 
distinguish between the two models within the errors of 
the measurements. 

An alternative method to estimate the unknown pa- 
rameters is that of least squares. The estimate 9 of $ is 
the value of y that minimizes 

S(y)=iw,[;* -/,<*,)]*, (6) 

where the w t are suitably chosen weights. Note that 



zkwtlMtd-Mtfttt. (7) 

Moreover, if the weights w t are so chosen that w t Var (e,) 
are bounded, then 



o&w l [Mt i )-f v (t i )] 1 ) W .p.l (8) 



y,=m- i 2 y f (r = l 2k + 1) 

l={r-l)m 



to estimate the moving averages tx r =m ' 2 fofa) of 

the regression function. Replacing /*, — ju, r _i by y r — j>,_, 
in Prony's algebraic equation defining ^=e"* , ' Dl4 , solu- 
tion of the algebraic equation then gives the estimate of 
X,. This method therefore tries to average out the ran- 
dom errors by introducing the new parameters \i r and 
using the sample means y, to estimate jx,. 



(with probability 1) as 



d{6,y) =2>v,[/ y (A) ~Mh)f^ » . 



(9) 



The quantity d(S,y)(=E{S(y)-S(d)}) defined in (9) is 
a measure of the separation (distance) between y and the 
parameter vector 6 reflected by the data. When the €,- 
are normal jV(0,l/w r ) random variables, 1/2 d(Q,y) is 
the Kullback-Leibler information number and the least 
squares estimate coincides with the maximum 
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likelihood estimate. Thus, the least squares method av- 
erages out the errors e, in the weighted sums 

2w, \fe(t<)—f y {tj)]€.;/d(6,y) for all choices of y under 
consideration. Consistency of this method under as- 
sumption (9) is therefore an immediate consequence of 
(7) and (8) if there are only finitely many such choices 
among which one is the true parameter vector, and its 
extension to infinite parameter spaces involves addi- 
tional compactness arguments (cf. [7], [11]). 

Since f y (t) is linear in the parameters ft, a, a k , 

least squares estimates of these "linear" parameters are 
given by the standard formulas in multiple linear re- 
gression theory for every fixed X = (X l3 .. ., \ k ). For fixed 
X, define 



p,a.i,.,.,a, k i=\ 



(10) 



and the original least squares problem is reduced to that 
of minimizing S*(X) involving only the decay rate con- 
stants X;, . . ., X* . This approach, suggested by Golub and 
Pereyra [5] and Osborne [8], has the great advantage of 
reducing the dimensionality of the parameter vector 
from 2k -f 1 to k. Using this approach to fit a two- 
compartment model to the Lanczos data, Varah [9] has 
recently shown that S*(X],X 2 ) (with equal weights) has 
ill-conditioned Hessian matrices and is very flat over a 
broad region containing the minimum. 

We now show how a global analysis of the least 
squares function S*(X) enables us to assess how much 
information there is in the data to specify the model. Not 
only does such analysis provide a relatively stable nu- 
merical algorithm for finding the least squares estimates 
of the model parameters, but it also sheds light on the 
range of models that are compatible with the data. 

To fix the ideas, consider the Poisson-type error 
model Vs.r(€i)=f a (t:)<r 2 , in which f e (t) is large, at least 
during the initial portion of the sampling times, as is 
often the case in tracer measurements. For large f e (t;), 

log y t =log / fl (r,)+log{l h- 6,/Mt,)} 

xlogMti+^/iMtrf, (11) 

where r), = e,/(/o(f,0)^ has mean and constant variance 
or 1 . This suggests that/i(f f ) can be estimted with small 
relative error when feitj) is large. Therefore we intro- 
duce "ideal" weights of the form w? = l/max{/*#(r,-),C}, 
where C is some large constant, and define the actual 
weights 

w,= l/mas{/ fl (r / ) I C}, (12) 

where f e {t ) denotes some initial estimate of f g (t ) such 



that /»(/) is proportionally close to/»(r) if f 6 (t) is large. 
With this choice of weights w it we define the least 
squares function S*(X) by (10) and study its global and 
local properties by using both discretization and gra- 
dient methods. The idea is to partition the & -dimensional 
parameter space A (set of all possible values for X) into 
a finite number of subregions. The minimum S*(X D ) in 
each subregion D is Found by standard gradient-type 
(such as the Marquardt or Fletcher) algorithms. The 
minimum of S*(Xr>) over all subregions D then gives the 

least squares estimate X" (S*(H) — S*(X)). Moreover, 

those values of S*(X D ) that are proportionally close to 
S*(X) also give a range of models compatible with the 
data. 

Figure 1 illustrates the results of this analysis in the 
regression model 



v,= 100 e~'' + 1000 «-*' + €,, / = 1, ...,«, 



(13) 



where n = 50, r, = (0.0 1)*', and the e, are independent nor- 
mal random variables with zero means and 
Var(e,)=£>,. Here ^ = 1, X 2 =5, a t = 100 and ct 2 =l000 
are the unknown parameters, and /3 = is assumed 
known. The initial estimates / 9 (? r ) in the weights (12), 
where we set C=30, are obtained by using the Cornell- 
Lanczos estimate of &. Prior knowledge of the in- 
equality constraints 0<\,<4 and 4<\ 2 <10 is assumed, 
and we divide this parameter space into 24 unit squares. 
In 100 simulation experiments performed, we obtained 
results similar to those in figure 1. In figure 1 reporting 
one such simulation, S*(X D ) is shown inside each unit 
square D, near the minimizing point X D which is repre- 
sented by a small triangle. The solid triangles denote 
those X D whose S*(X D ) values lie within 10% of the 
minimum value, which is underlined in the figure. At the 
true parameter vector X=(l,5), S*(X)=51.S, which dif- 
fers from, the minimum value of 47.1 by about 9%. The 
curves represent the contour S*(X) — 52, so that S* lies 
within 10% of its minimum value in the region between 
the curves. 

The wide range of models compatible with the data in 
figure 1 is in sharp contrast to figure 2, where in addition 
to the data during the time interval [0,0.5] of figure 1 we 
took 75 additional observations (generated by the model 
(13)) at equally spaced times in the subsequent period 
(0.5,1.25). In figure 2 there is a relatively small range of 
parameter vectors X whose S*(X) values are near the 
underlined minimum value at the least squares estimate 
X\ which is remarkably close to true parameter X=(l,5). 

Regarding S(y) — S {&) as an estimate of the "informa- 

n 

tion distance" d (6, y) = '2w,\f »{[;)— f y {tt)f, we can use it 
to assess the compatibility of the model f r with the data. 
To illustrate this idea, consider the 14 models repre- 
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Figure 1, 



sented by solid triangles in figure 1, whose S* values lie 
10% of the minimum value of 47.1. We tabulate below 
S(y)—S(&) for each of these 14 values of y. In addition, 
the corresponding information distances d(9 t y) and 

rf*(0, r )=£[^-/ Y (/O]V/<,fe) are also tabulated for 
comparison. Note that d*(6,y) is remarkably close to 
d(0,y); moreover, d*(0,y) is so small (in good agree- 
ment with S(y)-S0)) that/ r (f) is within 2% of f e (t) 
over the observed time range 0.01 <f <0.5 in each of the 
14 cases. 

S( 7 >-S(§) diB.y) d*(9,y) S{y)-S(6> d(0,y) d*{tt,y) 






0.63 


0.63 


1.0 


2.2D 


2.24 


0.2 


3.1S 


3.25 


1.8 


1.58 


1.61 


4.0 


1.67 


1.69 


2.1 


2.80 


2.84 


0.5 


2.72 


2.78 


2.0 


2.11 


2.14 


1.1 


5.21 


5.18 


3.0 


1,67 


1.69 


0.8 


3.01 


3.06 


4.5 


0.06 


0.06 


1.3 


1.20 


1.23 


3.9 


0.67 


0.63 



In the case of normal N(0,1/h>i) random errors £,, 
exp{-l/2(5(y)-5(&))} is the generalized likelihood 
ratio for testing H Q :$=y. We can also construct con- 
fidence regions for by using contours of the function 



S(y), as will be shown elsewhere. In this connection, 
Bates and Watts [1] recently proposed another useful 
method involving parameter transformations to im- 
prove the standard asymptotic approximations for con- 
structing confidence regions. 

The separation of the full parameter vector 6 into its 
linear and nonlinear components is not only of com- 
putational interest, but it also has basic statistical impli- 
cations. Analogous to the preceding paragraph, in the 
case of independent JV(0,l/w,) random errors, 
exp{- 1/2 (S *<X)-,S *(£))} is the generalized like- 
lihood ratio for testing whether \ is the true vector of 
decay rate constants, with jS, a lt .... a* as the nuisance 
parameters. This idea can be easily extended to simulta- 
neous equations (2) defining £ -compartment models, 
where we have 



yJ(f)=Pv+ S^e'V+t/O, v=l, ..., k. 



We can similarly define 



S*{k)= 



mm 



(fiv,a vJ )\ 



tvJ'J 



2 ?,w v {t)[y v {t)- 



-ft-lcvr*]* 
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Figure 2. 
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and interpret exp{-l/2 (S*(X)-S *(£))} as a gener- 
alized likelihood ratio for testing whether X is the true 
vector of decay rate constants on the basis of data from 
all k compartments. 



3. Decay Rates and Their Estimation 

Consider the polyexponential regression model (4). 
As illustrated in figure 1, one often encounters a wide 
range of rate parameter vectors (X„ ..., X*) that are 
compatible with one's data. In such circumstances, since 
there is not enough information to estimate the individ- 
ual rate constants, it is more meaningful to consider 
them in a combined characteristic, such as the fractional 
rate of decay 



r(f)=lim S" 
8^0 



'{l -Mt +S)/Mt)}=-(d/dt) logMt) 



at different time points t of interest. Letting X =0 and 

a =(i, the logarithmic derivative o{fe(t)=2<Xje~ k '' is 
given by — 1 times 



/•(/)= 2 p,{t)\„ where p,(f)= a. t e ~ x ''/ 2 a, e ~ K ''. (14) 
;=,o /-o 

Thus, r(t) is a convex combination of the rate constants 
hi, with the proportional size of the /'* exponential term 
as the natural weighting factor for the decay rate X.. 
To estimate r(f) at a particular time point r within the 
range of sampling times, we propose to balance "global 
information" from all sampling times (leading to 
weighted least squares estimates described in sec. 2) 
with "local information" from only the sampling times 
near r. We start by using the method of section 2 on all 
the data to find a region A of rate parameters (Xi, . . ., X t ) 
that are compatible with the data. To choose a vector in 
A that will provide the best estimate of r(r), we note 
that the logarithmic derivative — r(r) is a "local" quan- 
tity involving only sampling times near t, and it is there- 
fore reasonable to weight the observations not only by 
their variability but also by how far their sampling times 
are from t, putting more weight on sampling times near 
r. With this new set of weights w,(t), we calculate the 
(constrained) least squares estimate 9(r) of the parame- 
ter vector 6, under the constraints XeA and ctj>0(/=0, 
1, ..., k). Substituting the unknown parameters in eq (14) 
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by these least squares estimates, we obtain the estimate 
?(r) of r(r). 

A detailed discussion of the procedure sketched 
above, together with a comparative study of this ap- 
proach and the popular curve-peeling methods For com- 
partmental analysis (cf. [12]), will be presented else- 
where. 



The author thanks Dr. Hung Chen for helpful dis- 



cussions. 
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One of the major tasks of marine geochemists is deter- 
mining the uptake by the sea of C0 2 derived from the 
combustion of fossil fuels. Until valid models of the 
general circulation of the ocean are constructed, this 
task will have to be done with box models calibrated 
through use of the distribution of natural radioisotopes 
and transient tracers. 

We need to explore how sensitive the uptake of fossil 
fuel C0 2 is to the basic design of these models and how 
the design can be improved by simultaneously fitting the 
distributions of several tracers. Five different 11-box 
thermocline circulation models of the temperate North 
Atlantic were constructed for this purpose.* Anthro- 
pogenic tritium, 3 He, and radiocarbon are used as trac- 
ers to calibrate these models. The temporal input func- 
tions of these tracers differ considerably from one 
another. So also do the geographic patterns of their 
inputs and their geochemistries in the sea. 

Using the basic equation of the box model [e.g., eq (1) 
of T.L. Lai's presentation at this conference] and the 
finite difference method of computation for mass bal- 



ance in each box, these thermocline ventilation models 
with differing circulation patterns were caUbrated to 
yield a tritium distribution similar to that observed dur- 
ing the Geochemical Ocean Section Studies 
(GEOSECS) survey in 1973. These models were then 
run for J He and bomb-produced H C. While the models 
differ significantly in their ability to match the observed 
3 He and U C distributions, these differences are not large 
enough to clearly single out one model as superior. This 
insensitivity of tracer to tracer ratio to model design is 
reflected by the nearly identical uptake of C0 2 by the 
various models. This result also suggests that the uptake 
of COi by the sea is limited more by the rates of physical 
mixing within the sea than by the rate of gas exchange 
across the sea surface. 



* Research of the application of box models for the geochemical 
modeling of oceans was supported jointly by the National Science 
Foundation's Ecosystems Studies Program under Interagency Agree- 
ment DEB 8115316 and the Carbon Dioxide Research Division, Of- 
fice of Energy Research, U.S. Department of Energy, under contract 
DE-AC05-S4OR21400 with Martin Marietta Energy Systems, Inc. 
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